NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition

Qraitem, Maan; Saenko, Kate; Plummer, Bryan A (September 2024, European Conference on Computer Vision)

Visual recognition models are prone to learning spurious correlations induced by a biased training set where certain conditions B (e.g., Indoors) are over-represented in certain classes Y (e.g., Big Dogs). Synthetic data from off-the-shelf large-scale generative models offers a promising direction to mitigate this issue by augmenting underrepresented subgroups in the real dataset. However, by using a mixed distribution of real and synthetic data, we introduce another source of bias due to distributional differences between synthetic and real data (e.g., synthetic artifacts). As we will show, prior work's approach for using synthetic data to resolve the model's bias toward B does not correct the model's bias toward the pair (B, G), where G denotes whether the sample is real or synthetic. Thus, the model could simply learn signals based on the pair (B, G) (e.g., Synthetic Indoors) to make predictions about Y (e.g., Big Dogs). To address this issue, we propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real (FFR). The first step of FFR pre-trains a model on balanced synthetic data to learn robust representations across subgroups. In the second step, FFR fine-tunes the model on real data using ERM or common loss-based bias mitigation methods. By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data and thus avoids the issue of bias toward the pair (B, G). Our experiments show that FFR improves worst-group accuracy over the state of the art by up to 20% over three datasets.
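To make the real-versus-synthetic argument concrete, here is a small counting sketch. It writes Y for the class, B for the condition, and G for whether an image is real or synthetic; all class names, conditions, and counts are hypothetical. Topping up the rare (Y, B) subgroups of a biased real set with synthetic images balances Y against B. In the resulting mixed set, however, the pair (B, G) perfectly predicts Y for the synthetic images. A synthetic-only set balanced over (Y, B), which matches the abstract's description of the data FFR pretrains on, leaves G carrying no class information. The `worst_group_accuracy` helper illustrates the worst-group metric, assuming groups are (Y, B) pairs. This sketch covers only the data-level argument. It does not train a model or reproduce the paper's pipeline.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Each sample is a (y, b, g) triple: class Y, condition B, source G (real/synthetic).
# Class names, conditions, and all counts below are hypothetical.

def make(y, b, g, n):
    """Return n copies of one (Y, B, G) sample description."""
    return [(y, b, g)] * n

def class_rate_given_bg(samples, target_class):
    """Map each (B, G) pair to P(Y = target_class | B, G) within `samples`."""
    totals = Counter((b, g) for _, b, g in samples)
    hits = Counter((b, g) for y, b, g in samples if y == target_class)
    return {pair: Fraction(hits[pair], n) for pair, n in totals.items()}

def worst_group_accuracy(labels, preds, groups):
    """Minimum per-group accuracy; a group could be, e.g., a (Y, B) pair."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p, grp in zip(labels, preds, groups):
        total[grp] += 1
        correct[grp] += int(y == p)
    return min(Fraction(correct[grp], total[grp]) for grp in total)

# Biased real data: big dogs mostly indoors, small dogs mostly outdoors.
real = (make("big_dog", "indoor", "real", 90) + make("big_dog", "outdoor", "real", 10)
        + make("small_dog", "indoor", "real", 10) + make("small_dog", "outdoor", "real", 90))

# Mixed real + synthetic: top up the minority (Y, B) subgroups with synthetic images.
mixed = (real + make("big_dog", "outdoor", "synthetic", 80)
         + make("small_dog", "indoor", "synthetic", 80))

# Synthetic-only set balanced over (Y, B), as used for pretraining in this sketch.
synthetic_only = [s for y in ("big_dog", "small_dog") for b in ("indoor", "outdoor")
                  for s in make(y, b, "synthetic", 50)]

mixed_rates = class_rate_given_bg(mixed, "big_dog")
ffr_rates = class_rate_given_bg(synthetic_only, "big_dog")
```

With these counts, every (Y, B) subgroup in `mixed` has 90 samples. Even so, `mixed_rates[("outdoor", "synthetic")]` is 1 and `mixed_rates[("indoor", "synthetic")]` is 0, while both entries of `ffr_rates` are 1/2.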
Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

Qraitem, Maan; Saenko, Kate; Plummer, Bryan A (July 2023, IEEE Computer Society Conference on Computer Vision and Pattern Recognition)

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

https://doi.org/10.1007/978-3-031-20074-8_18

Burns, Andrea; Arsan, Deniz; Agrawal, Sanjna; Kumar, Ranjitha; Saenko, Kate; Plummer, Bryan A. (November 2022, The European Conference on Computer Vision)

Regularizing Action Policies for Smooth Control with Reinforcement Learning

https://doi.org/10.1109/ICRA48506.2021.9561138

Mysore, Siddharth; Mabsout, Bassel; Mancuso, Renato; Saenko, Kate (May 2021, ICRA)

Extending the WILDS Benchmark for Unsupervised Adaptation

Sagawa, Shiori; Koh, Pang Wei; Lee, Tony; Gao, Irena; Xie, Sang Michael; Shen, Kendrick; Kumar, Ananya; Hu, Weihua; Yasunaga, Michihiro; Marklund, Henrik; et al. (January 2022, International Conference on Learning Representations (ICLR))

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as identical evaluation metrics. We systematically benchmark state-of-the-art methods that use unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development, we provide an open-source package that automates data loading and contains the model architectures and methods used in this paper.
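The abstract names self-training as one of the families of methods benchmarked. The following is a rough illustration only: a generic confidence-thresholded pseudo-labeling step, not the WILDS package API and not any specific method from the paper. A model trained on labeled source data predicts class probabilities for unlabeled target examples, and only the confident predictions are kept as extra training labels. The function name, the 0.9 threshold, and the probability values are assumptions.

```python
def pseudo_label(probabilities, threshold=0.9):
    """Return (index, label) pairs for unlabeled examples whose highest
    predicted class probability is at least `threshold` (hypothetical helper)."""
    selected = []
    for i, probs in enumerate(probabilities):
        # argmax over classes; ties resolve to the lowest class index
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected

# Hypothetical softmax outputs of a source-trained model on three unlabeled target examples.
target_probs = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
print(pseudo_label(target_probs))  # [(0, 0), (2, 1)]
```

The uncertain second example is left out. In a full self-training loop, the selected pairs would be added to the training data and the model retrained, usually over several rounds.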
Extending the WILDS Benchmark for Unsupervised Adaptation

Sagawa, Shiori; Koh, Pang Wei; Lee, Tony; Gao, Irena; Xie, Sang Michael; Shen, Kendrick; Kumar, Ananya; Hu, Weihua; Yasunaga, Michihiro; Marklund, H.; et al. (January 2022, International Conference on Learning Representations)

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as the evaluation metrics. On these datasets, we systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at this https URL.
Learning multi-level hierarchies with hindsight

Levy, Andrew; Konidaris, George; Platt, Robert; Saenko, Kate (January 2019, Proceedings of International Conference on Learning Representations)

Adapting control policies from simulation to reality using a pairwise loss

Viereck, Ulrich; Saenko, Kate; Platt, Robert (January 2018, Proceedings of 2018 International Symposium on Experimental Robotics (ISER 2018))

Adapting control policies from simulation to reality using a pairwise loss

Viereck, Ulrich; Saenko, Kate; Platt, Robert (January 2018, International Symposium on Experimental Robotics)

This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot. We explore the idea in the context of a “category level” manipulation task where a control policy is learned that enables a robot to perform a mating task involving novel objects. We explore the case where depth images are used as the main form of sensor input. Our experimental results demonstrate that the proposed method consistently outperforms baseline methods that train only in simulation or that combine real and simulated data in a naive way.
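The abstract does not spell out the form of the pairwise loss. One common way to realize a pairwise sim-to-real objective is to collect simulated and real observations of the same scene and penalize the distance between their learned features, adding that penalty to the task loss. The sketch below takes this approach. The squared Euclidean distance, the 0.5 weight, and the function names are hypothetical and do not come from the paper.

```python
def pairwise_alignment_loss(sim_features, real_features):
    """Mean squared Euclidean distance between features of paired simulated
    and real observations; pair i is assumed to show the same scene."""
    if len(sim_features) != len(real_features) or not sim_features:
        raise ValueError("expected a non-empty list of sim/real pairs")
    total = 0.0
    for s, r in zip(sim_features, real_features):
        total += sum((a - b) ** 2 for a, b in zip(s, r))
    return total / len(sim_features)

def combined_loss(task_loss, sim_features, real_features, weight=0.5):
    """Task loss plus a weighted pairwise term pulling sim and real features together."""
    return task_loss + weight * pairwise_alignment_loss(sim_features, real_features)

sim = [[1.0, 0.0], [0.0, 1.0]]
real = [[1.0, 0.0], [0.0, 0.0]]
print(pairwise_alignment_loss(sim, real))  # 0.5
print(combined_loss(0.2, sim, real))       # 0.45
```

The weight trades off fitting the control task against making simulated and real inputs look alike to the policy. Setting it to 0 recovers training on the task loss alone.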
Grasp Pose Detection in Point Clouds

https://doi.org/10.1177/0278364917735594

ten Pas, Andreas; Gualtieri, Marcus; Saenko, Kate; Platt, Robert (July 2017, The International Journal of Robotics Research)
